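Classification models trained on biased datasets usually perform poorly on out-of-distribution samples because biased representations are embedded in the model. Various debiasing methods have recently been proposed to disentangle biased representations, but discarding only the biased features without altering other relevant information is challenging. In this paper, we propose a novel augmentation approach that explicitly generates additional images using the texture representations of differently labeled images, enlarging the training dataset and mitigating the effect of bias when training a classifier. Each newly generated image contains similar content information from a source image while transferring the texture of a target image with a different label. Our model includes a texture co-occurrence loss, which determines whether the texture of a generated image is similar to that of the target, and a spatial self-similarity loss, which determines whether content details are preserved between the generated and source images. Both the generated and the original training images are then used to train a classifier with improved robustness against biased representations. We use five distinct artificially designed datasets with known biases to demonstrate the ability of our method to mitigate bias information. In all cases, our method outperforms existing state-of-the-art methods. Code is available at https://github.com/myeongkyunkang/i2i4debias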
translated by 谷歌翻译
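The spatial self-similarity idea in the abstract above can be illustrated with a small dependency-free sketch. It assumes that features are given as one vector per spatial position, that similarity is cosine similarity, and that the two maps are compared by mean absolute difference; these choices and the function names `self_similarity` and `self_similarity_loss` are illustrative assumptions, not taken from the authors' repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def self_similarity(features):
    """Pairwise cosine similarity between all spatial positions."""
    return [[cosine(fi, fj) for fj in features] for fi in features]

def self_similarity_loss(source_feats, generated_feats):
    """Mean absolute difference between the two self-similarity maps (assumed form)."""
    s = self_similarity(source_feats)
    g = self_similarity(generated_feats)
    n = len(s)
    return sum(abs(s[i][j] - g[i][j]) for i in range(n) for j in range(n)) / (n * n)

# Toy features: three spatial positions, two channels each.
source = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rescaled = [[2.0, 0.0], [0.0, 3.0], [4.0, 4.0]]   # same layout, different magnitudes
scrambled = [[0.0, 1.0], [1.0, 1.0], [1.0, 0.0]]  # positions permuted

print(self_similarity_loss(source, rescaled))
print(self_similarity_loss(source, scrambled))
```

In this toy setting the loss stays near zero when only the feature magnitudes change and grows when the spatial arrangement changes, which is the kind of signal a content-preservation term needs.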
Cartoonization is a task that renders natural photos in cartoon styles. Previous deep cartoonization methods have focused only on end-to-end translation, which may hinder editability. Instead, we propose a novel solution that offers editing features for texture and color, based on the cartoon creation process. To this end, we design a model architecture with separate texture and color decoders that decouple these attributes. In the texture decoder, we propose a texture controller, which enables a user to control stroke style and abstraction to generate diverse cartoon textures. We also introduce an HSV color augmentation that induces the networks to generate diverse and controllable color translations. To the best of our knowledge, our work is the first deep approach to control cartoonization at inference time while showing a profound quality improvement over baselines.
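The abstract does not say how the HSV color augmentation is parameterized. As a rough sketch, one plausible form is a random shift of hue, saturation, and value applied to every pixel; the function names, the shift ranges, and the use of Python's standard `colorsys` module are assumptions made for illustration only.

```python
import colorsys
import random

def shift_hsv(rgb, dh=0.0, ds=0.0, dv=0.0):
    """Shift one RGB pixel (components in [0, 1]) in HSV space."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    h = (h + dh) % 1.0                    # hue is circular
    s = min(1.0, max(0.0, s + ds))        # clamp saturation
    v = min(1.0, max(0.0, v + dv))        # clamp value
    return colorsys.hsv_to_rgb(h, s, v)

def augment_image(pixels, rng, max_dh=0.1, max_ds=0.2, max_dv=0.2):
    """Apply one random HSV shift (hypothetical ranges) to all pixels of an image."""
    dh = rng.uniform(-max_dh, max_dh)
    ds = rng.uniform(-max_ds, max_ds)
    dv = rng.uniform(-max_dv, max_dv)
    return [shift_hsv(p, dh, ds, dv) for p in pixels]

image = [(1.0, 0.0, 0.0), (0.2, 0.4, 0.6), (0.9, 0.9, 0.9)]
augmented = augment_image(image, random.Random(0))
print(augmented)
```

Using a single shift per image keeps the relative colors within the image consistent, which is one reasonable design choice when the goal is a controllable global color change.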
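The non-local (NL) block is a popular module that has demonstrated the ability to model global context. However, the NL block generally incurs heavy computation and memory costs, so applying it to high-resolution feature maps is impractical. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of the input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation, which is generally used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily on the magnitude of the key vectors, and performance degrades if the magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method shows robustness to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention usable without additional computational cost.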
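The contrast between softmax normalization and a plain scaling factor can be sketched as follows. The abstract does not state which scaling factor is used; this example assumes division by the number of key positions, and the function names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scale_by_count(row):
    """Assumed scaling factor: divide raw scores by the number of keys."""
    return [x / len(row) for x in row]

def attention(queries, keys, values, normalize):
    """Aggregate values with weights normalize(q . k_j) for each query."""
    out = []
    for q in queries:
        weights = normalize([dot(q, k) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [10.0, 0.0]]      # same direction, very different magnitude
values = [[1.0, 0.0], [0.0, 1.0]]

print(attention(queries, keys, values, softmax))
print(attention(queries, keys, values, scale_by_count))
```

With softmax, the large-magnitude key receives almost all of the attention weight even though both keys point in the same direction, which mirrors the dependence on key magnitude described in the abstract.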
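Vision transformers (ViTs) are the dominant models in the field of computer vision. Although numerous studies have focused mainly on inductive bias and complexity, the problem of finding better transformer networks remains. For example, conventional transformer-based models usually use one projection layer for each of the query ($Q$), key ($K$), and value ($V$) embeddings before multi-head self-attention. Insufficient consideration of semantic $Q$, $K$, and $V$ embeddings may lead to a drop in performance. In this paper, we propose three types of structures for $Q$, $K$, and $V$ embedding. The first structure uses two layers with ReLU, which provides a non-linear embedding for $Q$, $K$, and $V$. The second shares one non-linear layer in order to share knowledge among $Q$, $K$, and $V$. The third shares all non-linear layers and adds code parameters. The codes are trainable, and their values determine which embedding process is performed among $Q$, $K$, and $V$. In experiments, we demonstrate that the proposed approaches achieve superior image classification performance compared with several state-of-the-art methods. On the ImageNet-1k dataset, the proposed method achieves $71.4\%$ accuracy with few parameters ($3.1M$), compared with $69.9\%$ for the original XCiT-N12 transformer model. In addition, the method achieves $93.3\%$ on average with only $2.9M$ parameters across the CIFAR-10, CIFAR-100, Stanford Cars, and STL-10 datasets, which is better than the $92.2\%$ accuracy obtained by the original XCiT-N12 model.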
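The first two embedding structures can be sketched in a few lines. This is only an illustration under stated assumptions: the ReLU is placed between the two layers, biases are omitted, the dimensions are arbitrary, and all function names are hypothetical. The third, code-based structure is left out because the abstract does not describe how the codes enter the computation.

```python
import random

def linear(x, weight):
    """Multiply vector x (length d_in) by weight (d_in rows, d_out columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, weight))
            for j in range(len(weight[0]))]

def relu(x):
    return [max(0.0, v) for v in x]

def init(rng, d_in, d_out):
    """Random weight matrix, for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(d_out)] for _ in range(d_in)]

def n_params(weight):
    return len(weight) * len(weight[0])

def separate_embedding(x, params):
    """Structure 1: Q, K, and V each get their own two-layer ReLU embedding."""
    return {name: linear(relu(linear(x, w1)), w2)
            for name, (w1, w2) in params.items()}

def shared_embedding(x, shared_w1, heads):
    """Structure 2: one shared non-linear layer, then a separate layer per Q, K, V."""
    hidden = relu(linear(x, shared_w1))
    return {name: linear(hidden, w2) for name, w2 in heads.items()}

rng = random.Random(0)
d = 4
token = [0.5, -1.0, 0.25, 2.0]

separate = {n: (init(rng, d, d), init(rng, d, d)) for n in ("q", "k", "v")}
shared_w1 = init(rng, d, d)
heads = {n: init(rng, d, d) for n in ("q", "k", "v")}

qkv_separate = separate_embedding(token, separate)
qkv_shared = shared_embedding(token, shared_w1, heads)

separate_count = sum(n_params(w1) + n_params(w2) for w1, w2 in separate.values())
shared_count = n_params(shared_w1) + sum(n_params(w) for w in heads.values())
print(separate_count, shared_count)
```

In this toy configuration, sharing the first layer reduces the number of embedding weights, which is consistent with the abstract's emphasis on small parameter counts.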
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
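Formulating classification as matching between data representations and class representations can be sketched as a nearest-match lookup. In the paper both representations come from learned encoders; here they are hand-written toy vectors, and the use of cosine similarity and the name `classify` are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def classify(data_repr, class_reprs):
    """Return the class name whose representation best matches data_repr."""
    return max(class_reprs, key=lambda name: cosine(data_repr, class_reprs[name]))

# Toy class representations keyed by class name (stand-ins for label-encoder outputs).
class_reprs = {
    "cat": [1.0, 0.0],
    "dog": [0.0, 1.0],
    "bird": [-1.0, 0.0],
}

# A client that only has "cat" and "dog" labels can still match a sample
# against the shared representation of "bird".
print(classify([0.9, 0.1], class_reprs))
print(classify([-0.8, 0.2], class_reprs))
```

Because the class set is just the set of keys in `class_reprs`, adding a class that a client has never labeled requires only a new class representation, not a new output layer.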
This paper addresses smooth function approximation by neural networks (NNs). Mathematical or physical functions can be replaced by NN models through regression. In this study, we obtain NNs that generate highly accurate and highly smooth functions while comprising only a few weight parameters, by discussing several topics in regression. First, we reinterpret the internals of NNs for regression and, consequently, propose a new activation function, the integrated sigmoid linear unit (ISLU). Next, we discuss the special characteristics of metadata for regression, which differ from those of other data such as images or sound, with the aim of improving NN performance. Finally, we present a simple hierarchical NN that generates models substituting for mathematical functions, and introduce a new batch concept, the "meta-batch", which improves NN performance several times over. The new activation function, the meta-batch method, the features of numerical data, meta-augmentation with metaparameters, and an NN structure that generates a compact multi-layer perceptron (MLP) are the essential components of this study.
Detecting abrupt changes in data distribution is one of the most significant tasks in streaming data analysis. Although many unsupervised Change-Point Detection (CPD) methods have been proposed recently to identify such changes, they still suffer from missed subtle changes, poor scalability, and sensitivity to noise points. To meet these challenges, we are the first to generalise the CPD problem as a special case of the Change-Interval Detection (CID) problem. We then propose a CID method, named iCID, based on the recent Isolation Distributional Kernel (IDK). iCID identifies a change interval if there is a high dissimilarity score between two non-homogeneous, temporally adjacent intervals. The data-dependent property and finite feature map of IDK enable iCID to efficiently identify various types of change points in data streams while tolerating noise points. Moreover, the proposed online and offline versions of iCID are able to optimise key parameter settings. The effectiveness and efficiency of iCID have been systematically verified on both synthetic and real-world datasets.
Time-series anomaly detection is an important task that has been widely applied in industry. Because manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining a large number of labels at low cost: it enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions because time-series data is numerically continuous and difficult to understand. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively, while generating labeling functions automatically from only a few labeled data points. These techniques are complementary and reinforce one another. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
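For readers unfamiliar with weak supervision, the following sketch shows what labeling functions over a time series might look like and how their votes can be combined. LEIAD generates its labeling functions automatically; the three hand-written rules, their thresholds, and the majority-vote aggregation used here are hypothetical stand-ins and are not the system's actual method.

```python
ABSTAIN = None  # a labeling function may decline to vote

def lf_high_value(series, i):
    """Vote anomaly (1) when the value exceeds a hypothetical threshold."""
    return 1 if series[i] > 10.0 else ABSTAIN

def lf_sudden_jump(series, i):
    """Vote anomaly when the step from the previous point is large, else normal (0)."""
    if i == 0:
        return ABSTAIN
    return 1 if abs(series[i] - series[i - 1]) > 5.0 else 0

def lf_near_median(series, i):
    """Vote normal when the value is close to the series median."""
    median = sorted(series)[len(series) // 2]
    return 0 if abs(series[i] - median) < 1.0 else ABSTAIN

def weak_labels(series, labeling_functions):
    """Majority vote per point; ties and all-abstain cases are not marked anomalous."""
    labels = []
    for i in range(len(series)):
        votes = [v for v in (lf(series, i) for lf in labeling_functions)
                 if v is not ABSTAIN]
        if not votes:
            labels.append(ABSTAIN)
        else:
            labels.append(1 if sum(votes) * 2 > len(votes) else 0)
    return labels

series = [1.0, 1.2, 0.9, 12.0, 1.1, 1.0]
labels = weak_labels(series, [lf_high_value, lf_sudden_jump, lf_near_median])
print(labels)  # [0, 0, 0, 1, 0, 0]
```

Note that the jump rule also fires on the recovery step after the spike; the other rules outvote it there, which illustrates why several noisy rules are combined rather than trusted individually.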
In the robotics and computer vision communities, surveillance tasks such as human detection, tracking, and motion recognition with a camera have been studied extensively. Deep learning algorithms are widely used in these tasks, as in other computer vision tasks. However, existing public datasets are insufficient for developing learning-based methods that handle surveillance in outdoor and extreme situations such as harsh weather and low illuminance. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS), containing more than 500,000 image pairs and first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g., an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). To the best of our knowledge, this is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks. We present an overview of the proposed dataset with statistics, along with methods of exploiting the dataset with deep learning-based algorithms. The latest information on the dataset and our study is available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.
Deep learning has been widely used for protein engineering. However, it is limited by the lack of sufficient experimental data to train an accurate model for predicting the functional fitness of high-order mutants. Here, we develop SESNet, a supervised deep-learning model that predicts the fitness of protein mutants by leveraging both sequence and structure information and by exploiting an attention mechanism. Our model integrates the local evolutionary context from homologous sequences, the global evolutionary context encoding rich semantics from the universal protein sequence space, and structure information accounting for the microenvironment around each residue in a protein. We show that SESNet outperforms state-of-the-art models for predicting the sequence-function relationship on 26 deep mutational scanning datasets. More importantly, we propose a data augmentation strategy that leverages data from unsupervised models to pre-train our model. After pre-training, our model achieves strikingly high accuracy in predicting the fitness of protein mutants, especially for higher-order variants (> 4 mutation sites), when fine-tuned using only a small number of experimental mutation data points (< 50). The proposed strategy is of great practical value because the required experimental effort, i.e., producing a few tens of experimental mutation data points for a given protein, is generally affordable for an ordinary biochemical group, and the strategy can be applied to almost any protein.